Ftype by Robert Dick (dickrp@wckn.dorm.clarkson.edu)
/*============================================================================*/
This program can be distributed with the following conditions:
1) It is in its original form; all the files in this archive are included and
unchanged.
2) One may not charge more for it than the price of the media being used in its
distribution. This means that one can "sell" a disk containing this program for
no more than the price of a blank disk.
3) Special case - inclusion in Aminet CD collections is allowed.
/*============================================================================*/
Name: ftype
/*============================================================================*/
Version: 1.0
/*============================================================================*/
Tested on:
Amiga 3000-25
4 megs fast, 2 megs chip
Seagate ST32550N drive
OS 3.1
/*============================================================================*/
Function:
This program determines a file's type based on past observances of similar
files. It can perform an action on the file based on the file's type.
/*============================================================================*/
Reason:
Before ftype, it was necessary to update file recognizers as new file types came
into usage. One had to go find a "brain file" or specify information from
detailed personal knowledge of the new file type. Recognizers in the past almost
always relied on a magic tag at or near the start of a file to identify it.
Ftype learns how to recognize a new file without requiring you to research the
file format or go hunting for a brain file. It takes advantage of magic numbers
but it doesn't rely on them. This means that you can use ftype to differentiate
between C source files and C++ source files if you want to. Of course, if two
files are extremely similar, ftype will have difficulty differentiating between
them.
/*============================================================================*/
Disclaimer:
I don't have any Amigas other than my trusty A3000 to test the system with so if
it makes your machine explode, tell me so I can fix the program.
/*============================================================================*/
My results:
I train the system with 2 instances of each of the following file types:
ILBM, C source, JPEG, LHA, song lyrics, SAS/C object files, PostScript
When tested on 8 instances of each type, the system has a 100% recognition rate.
/*============================================================================*/
Usage:
ftype -k <type>: Kill a file type.
ftype -t [-v] <type> <file>: Train.
Trains the network to recognize the specified file as a member of the specified
type.
ftype -h: Help.
Prints out a list of flags that can be used with ftype.
ftype -p: Print network.
Prints out the neural weights for each file type. This is useful to me for
debugging purposes.
ftype -l: List file types known.
Shows the names of the file types the network has been trained with and tells
how many times each file type has been trained.
ftype [-ivqadx] <file>: Recognize file.
Determines which type the specified file is and takes other actions based on the
flags given on the command line.
-i: Implicit training.
When a file type is strongly identified, it is used to refine the network.
-v: Verbose.
Tells how far from each file type known by the network the given file is.
-q: Quiet.
Doesn't print out the type of the file recognized. This is useful for shutting
ftype up when you just want to have it take an action with -d or -x.
-a <number>: Specify an accuracy 1 to 5.
Higher numbers result in ftype being slower but more accurate. Implicit training
with low accuracy values is not recommended.
-d: Carry out the default action.
Executes the default action for this file type. See the "Actions" section for
more information.
-x <action>: Carry out the specified action.
A more flexible version of -d. See the "Actions" section.
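The -i flag only refines the network when a file type is "strongly identified".
This document does not say how ftype decides that, so the following Python
sketch is only one plausible reading. It assumes that recognition yields a
distance per file type (as -v reports) and that a match is strong when the best
distance is well below the runner-up. The function name and the margin value
are hypothetical, not taken from ftype.

```python
def strongly_identified(distances, margin=0.5):
    """Return the best-matching type if it clearly beats the runner-up.

    distances: dict mapping file type name -> distance (smaller is closer).
    margin:    hypothetical threshold; the best distance must be smaller
               than margin * second-best distance to count as "strong".
    Returns None when no type stands out (so no implicit training happens).
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    if len(ranked) < 2:
        return None  # nothing to compare against
    (best, d_best), (_, d_second) = ranked[0], ranked[1]
    return best if d_best < margin * d_second else None
```

Under this reading, a file that sits almost equally close to two types (say, C
and C++ source) would be reported but not used for implicit training.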
/*============================================================================*/
Environment Variables:
Ftype uses three environment variables.
FTYPE_ACCURACY - This optional variable should hold a number from 1 to 5. It
specifies the accuracy ftype will use when identifying files. It is overridden
by the -a command line flag. If this environment variable does not exist and no
accuracy is specified on the command line, ftype defaults to an accuracy of 3.
See the "Usage" section for more information.
FTYPE_DATADIR - This necessary variable should hold the directory in which ftype
will locate its data file, "ftype.dat".
Example: sys:ftype/
FTYPE_ACTIONDIR - This variable is necessary if the -d or -x flags are to be
used. It specifies a directory in which ftype's action description files can be
found. See the "Actions" section for more information.
Example: sys:ftype/actions/
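The precedence described for the accuracy setting (the -a flag first, then
FTYPE_ACCURACY, then the default of 3) can be sketched in Python as follows.
This is an illustration of the rule, not ftype's code. The function name is
made up, and rejecting values outside 1 to 5 is an assumption, since this
document does not say what ftype does with them.

```python
DEFAULT_ACCURACY = 3  # documented default when neither source is set


def resolve_accuracy(cli_value, env):
    """Pick the accuracy to use.

    cli_value: the argument given to -a, or None if the flag was absent.
    env:       a dict of environment variables.
    """
    if cli_value is not None:
        raw = cli_value                    # -a overrides everything
    elif "FTYPE_ACCURACY" in env:
        raw = env["FTYPE_ACCURACY"]        # optional environment variable
    else:
        return DEFAULT_ACCURACY
    value = int(raw)
    if not 1 <= value <= 5:
        # Assumption: out-of-range values are treated as an error here.
        raise ValueError("accuracy must be between 1 and 5")
    return value
```

For example, resolve_accuracy("1", {"FTYPE_ACCURACY": "5"}) returns 1, because
the command line wins over the environment.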
/*============================================================================*/
Actions:
Ftype is capable of carrying out actions based on the type of a file that it
recognizes. This can most easily be illustrated by giving an abstract example.
You have trained ftype to recognize three file types: image, sound, and source.
You find that you often carry out the same abstract operation, viewing, on each
of these three file types. You find yourself, however, using three different
commands to view these files. You can use ftype to handle these details for you.
1) Specify ftype's "action" directory with the FTYPE_ACTIONDIR environment
variable.
2) In that directory, create a file called "image" (the same name as the file
type you have taught ftype). In this file, specify the names of different
actions, one of which may be the default action, and associate a command with
each one. Do the same for "sound" and "source".
Now, when you type "ftype -x view main.c", ftype figures out what main.c is and
carries out the appropriate viewing operation.
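The idea can be sketched in Python as a two-level lookup: first by recognized
file type, then by action name. The viewer commands below (show_image,
play_sound, show_text) are placeholders invented for the illustration, and the
in-memory table stands in for the per-type files in the action directory. For
brevity the sketch replaces every '@' with the file name and ignores the '@@'
escape.

```python
# Hypothetical per-type action tables; in ftype these would live in files
# named "image", "sound" and "source" inside the action directory.
ACTION_TABLES = {
    "image": {"view": "show_image @"},
    "sound": {"view": "play_sound @"},
    "source": {"view": "show_text @"},
}


def command_for(file_type, action, filename, tables=ACTION_TABLES):
    """Build the command line for one action on one recognized file."""
    template = tables[file_type][action]    # KeyError if type/action unknown
    return template.replace("@", filename)  # simplified: no '@@' handling
```

With this table, command_for("source", "view", "main.c") gives
"show_text main.c", while the same "view" action on an image or sound file
selects a different command.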
/*============================================================================*/
Action File Format:
The name of each action appears at the start of a line. The command line (what
you would type into the shell) appears on the same line, separated by
whitespace. The action with the name "default" will be carried out when the -d
command line flag is passed to ftype. The character '@' is used to cause the
name of the file being identified to be inserted. Two consecutive '@'s will
result in the placement of a single '@'.
Example for a text file type:
default run >NIL: muchmore C=000,fff,83E,F72 @
view run >NIL: muchmore C=000,fff,83E,F72 @
edit run >NIL: ed [] window=con:0/0/724/239/Ed/close @
junk mv @ sys:trash/old_text
print printfiles @
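A minimal Python sketch of these rules follows: the first word on each line is
the action name, the rest of the line is the command, a single '@' becomes the
file name, and '@@' becomes a literal '@'. The function names are made up for
the illustration and this is not ftype's own parser. How ftype treats blank or
malformed lines is not documented here, so the sketch simply skips them.

```python
def parse_action_file(text):
    """Map each action name to its command template."""
    actions = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split name from command on whitespace
        if len(parts) == 2:          # assumption: skip blank/incomplete lines
            actions[parts[0]] = parts[1].strip()
    return actions


def expand(template, filename):
    """Replace '@' with filename; '@@' yields a single literal '@'."""
    out = []
    i = 0
    while i < len(template):
        if template[i] == "@":
            if i + 1 < len(template) and template[i + 1] == "@":
                out.append("@")        # escaped '@'
                i += 2
            else:
                out.append(filename)   # file name placeholder
                i += 1
        else:
            out.append(template[i])
            i += 1
    return "".join(out)
```

Applied to the "junk" line above with the file notes.txt, expand() produces
"mv notes.txt sys:trash/old_text".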
/*============================================================================*/
Alias Trick:
If you get tired of typing something like "ftype -q -x view dragon.iff" every
time you want to use ftype to view a file, you can add a few lines to your
shell-startup to make things more transparent.
Example:
alias fview "ftype -q -x view []"
alias fdo "ftype -q -d []"
alias fedit "ftype -q -x edit []"
/*============================================================================*/
(Basic) Theory:
One may view the master neurons in ftype, one of which is associated with each
file type, as irregular regions on a map. Each file type occupies a region of
the map and it is the purpose of each neuron to fit itself to its own file type.
The neuron only knows about a limited number of instances (dots on the map) of
its file type (the ones you have trained it with) so it must make assumptions
about the shape of its file type region's border. Sometimes one neuron will
encroach on another's file type region. As you train the system, you may find
that it occasionally punishes a neuron. Punishment forces a neuron to
selectively retract from another neuron's file type region.
What I just told you isn't quite true. The "map" is a non-linear combination of
60-, 40- and 2-dimensional regions. The file type regions are 102-dimensional
blobs. I'm amazed that the program works at all.
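The map picture can be made concrete with a toy Python sketch. To be clear,
this is NOT ftype's network: it swaps in a plain nearest-prototype scheme on a
2-dimensional map, with one point standing in for each neuron, whereas the real
program works on a non-linear 102-dimensional map whose details are not
described here. All names and the learning rate are assumptions. Training pulls
a type's prototype toward a new instance, and "punishment" pushes away a
prototype that wrongly claimed the instance.

```python
import math


def distance(a, b):
    """Euclidean distance between two points on the toy map."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(prototypes, point):
    """Name of the file type whose prototype is closest to the point."""
    return min(prototypes, key=lambda name: distance(prototypes[name], point))


def move(prototype, point, rate):
    """Positive rate pulls toward the point; negative rate pushes away."""
    return [p + rate * (x - p) for p, x in zip(prototype, point)]


def train(prototypes, file_type, point, rate=0.5):
    """Teach the toy map that `point` is an instance of `file_type`."""
    if file_type not in prototypes:
        prototypes[file_type] = list(point)   # first instance of a new type
        return
    winner = nearest(prototypes, point)
    if winner != file_type:
        # Punish: the wrong prototype retracts from this instance.
        prototypes[winner] = move(prototypes[winner], point, -rate)
    prototypes[file_type] = move(prototypes[file_type], point, rate)
```

In this toy, training a type on an instance that another prototype currently
claims moves both prototypes, which is the encroach-and-retract behaviour
described above in miniature.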
/*============================================================================*/
Changes:
Version 0.9
First version
Version 1.0
Fixed infinite punishment bug.
Thanks to James Atwill and Magnus Holmgren for the bug reports.
Fixed neuron creation memory allocation bug.
A big thanks to Brian Mury for catching this one. I can't believe I let it slip
through.
Allows user to select the speed (and accuracy) of feature extraction.
Thanks to Meni Berman with his slower A1200 for the suggestion.
Lets user list file types known and the number of times each has been trained.
File types can now be killed.
Now uses (and needs) environment variables.
Changed the meaning of the -d flag.
Lets user associate actions with a file type (big change).
I made major changes to the neural network's learning schedule. This should
result in faster and more accurate learning but, unfortunately, it was necessary
to change the format of the data file.
/*============================================================================*/
Write Me:
If you have suggestions regarding improvements you would like to see or bug
reports, write me. I'm also very interested in hearing about how well ftype works
for you. What file types does it recognize? What accuracy are you getting out
of it? Please write me. I only have a limited number of files I can test it
with.
/*============================================================================*/
Suggestions I probably won't do anything about:
I have received a number of helpful suggestions from the users of ftype 0.9.
There are a few recurring requests that I probably won't be doing anything
about.
1) Right now, ftype compiles under any system with an ANSI C compiler. I am
reluctant to make any change that will tie it to one platform. You may wonder why
I'm distributing ftype for the Amiga and for no other platform. I have always
had much better luck getting helpful suggestions and coherent bug reports from
Amiga users than from users of more popular "commodity" computer systems.
2) A few people have asked for control over the structure of the neural network
used in ftype. This would defeat one of the main purposes of the program:
simplicity from the user's perspective. It's tedious, difficult, and not at all
fun to set up a functional neural network and training schedule. If you enjoy
this sort of thing, see a psychiatric doctor.
Robert Dick (dickrp@wckn.dorm.clarkson.edu)